27 research outputs found

    Caenorhabditis nomenclature

    Genetic nomenclature allows the genetic features of an organism to be structured and described in a uniform and systematic way. Genetic features, including genes, variations (both natural and induced), and gene products, are assigned descriptors that inform on the nature of the feature. These nomenclature designations facilitate communication among researchers (in publications, presentations, and databases) to advance understanding of the biology of the genetic feature and the experimental utilization of organisms that contain the genetic feature. The nomenclature system that is used for C. elegans was first employed by Sydney Brenner (1974) in his landmark description of the genetics of this model organism, and then substantially extended and modified in Horvitz et al., 1979. The gene, allele, and chromosome rearrangement nomenclature, described below, is an amalgamation of that from bacteria and yeast, with the rearrangement types from Drosophila. The nomenclature avoids standard words, subscripts, superscripts, and Greek letters and includes a hyphen (-) to separate the gene name from gene number (distinct genes with similar phenotypes or molecular properties). As described by Jonathan Hodgkin, ‘the hyphen is about 1 mm in length in printed text and therefore symbolizes the 1 mm long worm’. These nomenclature properties make C. elegans publications highly suitable for informatic text mining, as there is minimal ambiguity. From the founding of the Caenorhabditis Genetics Center (CGC) in 1979 until 1992, Don Riddle and Mark Edgley acted as the central repository for genetic nomenclature. Jonathan Hodgkin was nomenclature czar from 1992 through 2013; this was a pivotal period with the elucidation of the genome sequence of C. elegans, and later that of related nematodes, and the inception of WormBase. Thus, under the guidance of Hodgkin, the nomenclature system became a central feature of WormBase and the number and types of genetic features significantly expanded. The nomenclature system remains dynamic, with recent additions including guidelines related to genome engineering, and continued reliance on the community for input. WormBase assigns specific identifying codes to each laboratory engaged in dedicated long-term genetic research on C. elegans. Each laboratory is assigned a laboratory/strain code for naming strains, and an allele code for naming genetic variation (e.g., mutations) and transgenes. These designations are assigned to the laboratory head/PI who is charged with supervising their organization in laboratory databases and their associated biological reagents that are described on WormBase, in publications, and distributed to the scientific community on request. The laboratory/strain code is used: a) to identify the originator of community-supplied information on WormBase, which, in addition to attribution, facilitates communication between the community/curators and the originator if an issue related to the information should arise at a later date; and b) to provide a tracking code for activities at the CGC. The laboratory/strain designation consists of 2-3 uppercase letters while the allele designation has 1-3 lowercase letters. The final letter of a laboratory code should not be an “O” or an “I” so as not to be mistaken for the numbers “0” or “1” respectively. Additionally, allele designations should also not end with the letter “l” which could also be mistaken for the number “1.” These codes are listed at the CGC and in WormBase. Investigators generating strains, alleles, transgenes, and/or defining genes require these designations and should apply for them at [email protected]. Information for several other nematode species, in addition to C. elegans, is curated at WormBase. All species are referred to by their Linnean binomial names (e.g., Caenorhabditis elegans or C. elegans). Details of all the genomes available at WormBase and the degree of their curation can be found at www.wormbase.org/species/al
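
    The code-format rules stated in the abstract above (2-3 uppercase letters for a laboratory/strain code, not ending in "O" or "I"; 1-3 lowercase letters for an allele code, not ending in "l") can be sketched as a small format check. This is only an illustration of those stated rules: the function names are hypothetical, it is not WormBase's or the CGC's actual validation, and it says nothing about whether a code has really been assigned.

```python
import re

# Format rules as stated in the abstract; the names below are illustrative only.
LAB_CODE = re.compile(r"[A-Z]{2,3}")    # 2-3 uppercase letters
ALLELE_CODE = re.compile(r"[a-z]{1,3}")  # 1-3 lowercase letters


def is_valid_lab_code(code: str) -> bool:
    """Check a laboratory/strain code: 2-3 uppercase letters, and the final
    letter must not be 'O' or 'I' (easily confused with 0 and 1)."""
    return bool(LAB_CODE.fullmatch(code)) and code[-1] not in "OI"


def is_valid_allele_code(code: str) -> bool:
    """Check an allele code: 1-3 lowercase letters, and the final
    letter must not be 'l' (easily confused with 1)."""
    return bool(ALLELE_CODE.fullmatch(code)) and code[-1] != "l"


if __name__ == "__main__":
    # Illustrative inputs only; passing the check does not mean the code is registered.
    for candidate in ["CB", "CO", "ABCD"]:
        print(candidate, is_valid_lab_code(candidate))
    for candidate in ["e", "el", "abcd"]:
        print(candidate, is_valid_allele_code(candidate))
```

    The check is purely syntactic; whether a given code is actually in use would still have to be looked up in the lists kept at the CGC and WormBase.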

    WormBase - Annotating many nematode genomes

    WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase’s role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

    Automatic categorization of diverse experimental information in the bioscience literature

    Background: Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify, from all published literature, the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest, and thus is usually time consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers, based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental data types in the biocuration process at WormBase for the past two years, and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community, thereby greatly reducing time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature, such as C. elegans and D. melanogaster, to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance. Results: We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genome Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction. Conclusions: Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. The method is completely automatic and thus can be readily incorporated into different workflows at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.

    WormBase: a comprehensive resource for nematode research

    WormBase (http://www.wormbase.org) is a central data repository for nematode biology. Initially created as a service to the Caenorhabditis elegans research field, WormBase has evolved into a powerful research tool in its own right. In the past 2 years, we expanded WormBase to include the complete genomic sequence, gene predictions and orthology assignments from a range of related nematodes. These comparative data enrich the C. elegans data with improved gene predictions and a better understanding of gene function. In turn, they bring the wealth of experimental knowledge of C. elegans to other systems of medical and agricultural importance. Here, we describe new species and data types now available at WormBase. In addition, we detail enhancements to our curatorial pipeline and website infrastructure to accommodate new genomes and an extensive user base.

    The EMBL Nucleotide Sequence Database

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data

    WormBase: better software, richer content

    WormBase (), the public database for genomics and biology of Caenorhabditis elegans, has been restructured for stronger performance and expanded for richer biological content. Performance was improved by accelerating the loading of central data pages such as the omnibus Gene page, by rationalizing internal data structures and software for greater portability, and by making the Genome Browser highly customizable in how it views and exports genomic subsequences. Arbitrarily complex, user-specified queries are now possible through Textpresso (for all available literature) and through WormMart (for most genomic data). Biological content was enriched by reconciling all available cDNA and expressed sequence tag data with gene predictions, clarifying single nucleotide polymorphism and RNAi sites, and summarizing known functions for most genes studied in this organism

    WormBase 2016: expanding to enable helminth genomic research

    WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research

    WormBase: new content and better access

    WormBase (), a model organism database for Caenorhabditis elegans and other related nematodes, continues to evolve and expand. Over the past year WormBase has added new data on C. elegans, including data on classical genetics, cell biology and functional genomics; expanded the annotation of closely related nematodes with a new genome browser for Caenorhabditis remanei; and deployed new hardware for stronger performance. Several existing datasets, including phenotype descriptions and RNAi experiments, have seen a large increase in new content. New datasets such as the C. remanei draft assembly and annotations, the Vancouver Fosmid library and TEC-RED 5′ end sites are now available as well. Access to and searching of WormBase have become more dependable and flexible via multiple mirror sites and indexing through Google.